Exploratory Data Analysis of Loan Data from Prosper, by Barbara Stempien
========================================================

Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Borrowers request personal loans on Prosper and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. Investors can consider borrowers’ credit scores, ratings, and histories and the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.

Prosper verifies borrowers’ identities and select personal data before funding loans and manages all stages of loan servicing. Prosper’s unsecured personal loans are fully amortized over a period of three or five years, with no pre-payment penalties. Prosper generates revenue by collecting a one-time fee on funded loans from borrowers and assessing an annual loan servicing fee to investors.

Prosper publishes performance statistics on its website, and all market data is available to the public for analysis. The Prosper loan data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

Univariate Plots Section

We will begin the analysis by looking at the data contained in the data set.

##                ListingKey ListingNumber           ListingCreationDate
## 1 1021339766868145413AB3B        193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1       1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A         81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A        658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2        909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D       1074836 2013-12-14 08:26:37.093000000
##   CreditGrade Term LoanStatus          ClosedDate BorrowerAPR BorrowerRate
## 1           C   36  Completed 2009-08-14 00:00:00     0.16516       0.1580
## 2        <NA>   36    Current                <NA>     0.12016       0.0920
## 3          HR   36  Completed 2009-12-17 00:00:00     0.28269       0.2750
## 4        <NA>   36    Current                <NA>     0.12528       0.0974
## 5        <NA>   36    Current                <NA>     0.24614       0.2085
## 6        <NA>   60    Current                <NA>     0.15425       0.1314
##   LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1      0.1380                      NA            NA              NA
## 2      0.0820                 0.07960        0.0249         0.05470
## 3      0.2400                      NA            NA              NA
## 4      0.0874                 0.08490        0.0249         0.06000
## 5      0.1985                 0.18316        0.0925         0.09066
## 6      0.1214                 0.11567        0.0449         0.07077
##   ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1                      NA                  <NA>           NA
## 2                       6                     A            7
## 3                      NA                  <NA>           NA
## 4                       6                     A            9
## 5                       3                     D            4
## 6                       5                     B           10
##   ListingCategory..numeric. BorrowerState    Occupation EmploymentStatus
## 1                         0            CO         Other    Self-employed
## 2                         2            CO  Professional         Employed
## 3                         0            GA         Other    Not available
## 4                        16            GA Skilled Labor         Employed
## 5                         2            MN     Executive         Employed
## 6                         1            NM  Professional         Employed
##   EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1                        2                True             True
## 2                       44               False            False
## 3                       NA               False             True
## 4                      113                True            False
## 5                       44                True            False
## 6                       82                True            False
##                  GroupKey              DateCreditPulled
## 1                    <NA> 2007-08-26 18:41:46.780000000
## 2                    <NA>           2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4                    <NA>           2012-10-22 11:02:32
## 5                    <NA>           2013-09-14 18:38:44
## 6                    <NA>           2013-12-14 08:26:40
##   CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1                   640                   659     2001-10-11 00:00:00
## 2                   680                   699     1996-03-18 00:00:00
## 3                   480                   499     2002-07-27 00:00:00
## 4                   800                   819     1983-02-28 00:00:00
## 5                   680                   699     2004-02-20 00:00:00
## 6                   740                   759     1973-03-01 00:00:00
##   CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1                  5               4                         12
## 2                 14              14                         29
## 3                 NA              NA                          3
## 4                  5               5                         29
## 5                 19              19                         49
## 6                 21              17                         49
##   OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1                     1                          24                    3
## 2                    13                         389                    3
## 3                     0                           0                    0
## 4                     7                         115                    0
## 5                     6                         220                    1
## 6                    13                        1410                    0
##   TotalInquiries CurrentDelinquencies AmountDelinquent
## 1              3                    2              472
## 2              5                    0                0
## 3              1                    1               NA
## 4              1                    4            10056
## 5              9                    0                0
## 6              2                    0                0
##   DelinquenciesLast7Years PublicRecordsLast10Years
## 1                       4                        0
## 2                       0                        1
## 3                       0                        0
## 4                      14                        0
## 5                       0                        0
## 6                       0                        0
##   PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1                         0                      0                0.00
## 2                         0                   3989                0.21
## 3                        NA                     NA                  NA
## 4                         0                   1444                0.04
## 5                         0                   6193                0.81
## 6                         0                  62999                0.39
##   AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1                    1500          11                               0.81
## 2                   10266          29                               1.00
## 3                      NA          NA                                 NA
## 4                   30754          26                               0.76
## 5                     695          39                               0.95
## 6                   86509          47                               1.00
##   TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 1                       0              0.17 $25,000-49,999
## 2                       2              0.18 $50,000-74,999
## 3                      NA              0.06  Not displayed
## 4                       0              0.15 $25,000-49,999
## 5                       2              0.26      $100,000+
## 6                       0              0.36      $100,000+
##   IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 1             True            3083.333 E33A3400205839220442E84
## 2             True            6125.000 9E3B37071505919926B1D82
## 3             True            2083.333 6954337960046817851BCB2
## 4             True            2875.000 A0393664465886295619C51
## 5             True            9583.333 A180369302188889200689E
## 6             True            8333.333 C3D63702273952547E79520
##   TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1                NA                         NA                    NA
## 2                NA                         NA                    NA
## 3                NA                         NA                    NA
## 4                NA                         NA                    NA
## 5                 1                         11                    11
## 6                NA                         NA                    NA
##   ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
## 4                                  NA                              NA
## 5                                   0                               0
## 6                                  NA                              NA
##   ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1                       NA                          NA
## 2                       NA                          NA
## 3                       NA                          NA
## 4                       NA                          NA
## 5                    11000                      9947.9
## 6                       NA                          NA
##   ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1                          NA                         0
## 2                          NA                         0
## 3                          NA                         0
## 4                          NA                         0
## 5                          NA                         0
## 6                          NA                         0
##   LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1                            NA                         78      19141
## 2                            NA                          0     134815
## 3                            NA                         86       6466
## 4                            NA                         16      77296
## 5                            NA                          6     102670
## 6                            NA                          3     123257
##   LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1               9425 2007-09-12 00:00:00                Q3 2007
## 2              10000 2014-03-03 00:00:00                Q1 2014
## 3               3001 2007-01-17 00:00:00                Q1 2007
## 4              10000 2012-11-01 00:00:00                Q4 2012
## 5              15000 2013-09-20 00:00:00                Q3 2013
## 6              15000 2013-12-24 00:00:00                Q4 2013
##                 MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA             330.43            11396.14
## 2 1D13370546739025387B2F4             318.93                0.00
## 3 5F7033715035555618FA612             123.32             4186.63
## 4 9ADE356069835475068C6D2             321.45             5143.20
## 5 36CE356043264555721F06C             563.97             2819.85
## 6 874A3701157341738DE458F             342.37              679.34
##   LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1                      9425.00            1971.14        -133.18
## 2                         0.00               0.00           0.00
## 3                      3001.00            1185.63         -24.20
## 4                      4091.09            1052.11        -108.01
## 5                      1563.22            1256.63         -60.27
## 6                       351.89             327.45         -25.33
##   LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1                 0                     0                   0
## 2                 0                     0                   0
## 3                 0                     0                   0
## 4                 0                     0                   0
## 5                 0                     0                   0
## 6                 0                     0                   0
##   LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1                               0             1               0
## 2                               0             1               0
## 3                               0             1               0
## 4                               0             1               0
## 5                               0             1               0
## 6                               0             1               0
##   InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1                          0                           0       258
## 2                          0                           0         1
## 3                          0                           0        41
## 4                          0                           0       158
## 5                          0                           0        20
## 6                          0                           0         1
##          LoanStatus    ListingCreationDate    ClosedDate        
##  Current      :56576   Min.   :2005-11-09   Min.   :2005-11-25  
##  Completed    :38074   1st Qu.:2008-09-19   1st Qu.:2009-07-14  
##  Charged Off  :11992   Median :2012-06-16   Median :2011-04-05  
##  Defaulted    : 5018   Mean   :2011-07-08   Mean   :2011-03-07  
##  Past Due     : 2067   3rd Qu.:2013-09-09   3rd Qu.:2013-01-30  
##  Final Payment:  205   Max.   :2014-03-10   Max.   :2014-03-10  
##  Cancelled    :    5                        NA's   :58848       
##  ListingCategory..numeric.           ListingCategory  Term      
##  Min.   : 0.000            Debt Consolidation:58308   12: 1614  
##  1st Qu.: 1.000            Not Available     :16965   36:87778  
##  Median : 1.000            Other             :10494   60:24545  
##  Mean   : 2.774            Home Improvement  : 7433             
##  3rd Qu.: 3.000            Business          : 7189             
##  Max.   :20.000            Auto              : 2572             
##                            (Other)           :10976             
##   BorrowerRate    LoanOriginalAmount MonthlyLoanPayment BorrowerState     
##  Min.   :0.0000   Min.   : 1000      Min.   :   0.0     Length:113937     
##  1st Qu.:0.1340   1st Qu.: 4000      1st Qu.: 131.6     Class :character  
##  Median :0.1840   Median : 6500      Median : 217.7     Mode  :character  
##  Mean   :0.1928   Mean   : 8337      Mean   : 272.5                       
##  3rd Qu.:0.2500   3rd Qu.:12000      3rd Qu.: 371.6                       
##  Max.   :0.4975   Max.   :35000      Max.   :2251.5                       
##                                                                           
##  IsBorrowerHomeowner                    Occupation   
##  Mode :logical       Other                   :28617  
##  FALSE:56459         Professional            :13628  
##  TRUE :57478         Computer Programmer     : 4478  
##                      Executive               : 4311  
##                      Teacher                 : 3759  
##                      Administrative Assistant: 3688  
##                      (Other)                 :55456  
##       EmploymentStatus StatedMonthlyIncome DebtToIncomeRatio
##  Employed     :67322   Min.   :      0     Min.   : 0.0000  
##  Full-time    :26355   1st Qu.:   3200     1st Qu.: 0.1300  
##  Not available: 7602   Median :   4667     Median : 0.2100  
##  Self-employed: 6134   Mean   :   5608     Mean   : 0.2552  
##  Other        : 3806   3rd Qu.:   6825     3rd Qu.: 0.3100  
##  Part-time    : 1088   Max.   :1750003     Max.   :10.0100  
##  (Other)      : 1630                                        
##  ProsperRating..Alpha.  ProsperScore   OpenCreditLines 
##  C      :18345         0      :29084   Min.   : 0.000  
##  B      :15581         4      :12595   1st Qu.: 5.000  
##  A      :14551         6      :12278   Median : 8.000  
##  D      :14274         8      :12053   Mean   : 8.642  
##  E      : 9795         7      :10597   3rd Qu.:12.000  
##  (Other):12307         5      : 9813   Max.   :54.000  
##  NA's   :29084         (Other):27517                   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  0.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.59             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##                                                  
##  OpenRevolvingMonthlyPayment CurrentDelinquencies AmountDelinquent  
##  Min.   :    0.0             Min.   : 0.0000      Min.   :     0.0  
##  1st Qu.:  114.0             1st Qu.: 0.0000      1st Qu.:     0.0  
##  Median :  271.0             Median : 0.0000      Median :     0.0  
##  Mean   :  398.3             Mean   : 0.5884      Mean   :   918.6  
##  3rd Qu.:  525.0             3rd Qu.: 0.0000      3rd Qu.:     0.0  
##  Max.   :14985.0             Max.   :83.0000      Max.   :463881.0  
##                                                                     
##  DelinquenciesLast7Years PublicRecordsLast10Years RevolvingCreditBalance
##  Min.   : 0.000          Min.   : 0.0000          Min.   :      0       
##  1st Qu.: 0.000          1st Qu.: 0.0000          1st Qu.:   2091       
##  Median : 0.000          Median : 0.0000          Median :   7593       
##  Mean   : 4.119          Mean   : 0.3107          Mean   :  16424       
##  3rd Qu.: 3.000          3rd Qu.: 0.0000          3rd Qu.:  18254       
##  Max.   :99.000          Max.   :38.0000          Max.   :1435667       
##                                                                         
##  BankcardUtilization  LenderYield        Investors      
##  Min.   :0.0000      Min.   :-0.0100   Min.   :   1.00  
##  1st Qu.:0.2300      1st Qu.: 0.1242   1st Qu.:   2.00  
##  Median :0.5600      Median : 0.1730   Median :  44.00  
##  Mean   :0.5238      Mean   : 0.1827   Mean   :  80.48  
##  3rd Qu.:0.8200      3rd Qu.: 0.2400   3rd Qu.: 115.00  
##  Max.   :5.9500      Max.   : 0.4925   Max.   :1189.00  
## 

First, we want to see when and how many listings were created.

From the graph above, we can see that the number of loans on Prosper has increased over time. In 2008 the number of listings collapsed, but it has grown steadily since 2009.
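A minimal sketch of how listings per year could be counted, assuming the `ListingCreationDate` strings have the format shown in the first rows of the data set. Only those six values are used here, so the counts illustrate the mechanics and are not the real yearly totals.

```r
# ListingCreationDate values copied from the first six rows printed earlier.
created <- c("2007-08-26 19:09:29.263000000",
             "2014-02-27 08:28:07.900000000",
             "2007-01-05 15:00:47.090000000",
             "2012-10-22 11:02:35.010000000",
             "2013-09-14 18:38:39.097000000",
             "2013-12-14 08:26:37.093000000")

# Keep only the date part (first 10 characters) and parse it as a Date.
created_date <- as.Date(substr(created, 1, 10))

# Count listings per calendar year.
listings_per_year <- table(format(created_date, "%Y"))
print(listings_per_year)
```

On the full data set the same `table()` call would give one count per year, which is what a bar chart of listings over time displays.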

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

Another interesting feature is the loan amount. In the graph above we can see that the middle half of loans are granted for amounts between $4,000 and $12,000. The average loan amount is $8,337, while the median is $6,500.

The vast majority of loans (87,778, or about 77%) are granted for 36 months. Only a small fraction (1,614, or about 1.4%) are granted for 12 months.

Debt consolidation is the most common reason for a loan (58,308 loans). For a large number of loans, about 17,000, we have no information about the category. The next most common category is Other, followed by Home Improvement and Business.

About half of the loans in the dataset are currently active (around 56,600). Over 38,000 have been paid off. Around 17,000 were charged off or defaulted.
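The shares behind these statements can be checked directly from the `LoanStatus` counts in the summary output. The sketch below retypes those counts by hand; the variable names are illustrative only.

```r
# LoanStatus counts copied from the summary() output.
status_counts <- c(Current = 56576, Completed = 38074, `Charged Off` = 11992,
                   Defaulted = 5018, `Past Due` = 2067, `Final Payment` = 205,
                   Cancelled = 5)

# Total number of loans and the percentage share of each status.
total_loans  <- sum(status_counts)
status_share <- round(100 * status_counts / total_loans, 1)

# Loans that ended badly: charged off or defaulted.
bad_loans <- status_counts[["Charged Off"]] + status_counts[["Defaulted"]]

print(status_share)
print(bad_loans)
```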

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

The borrower rate is 0.1928 on average. Although a considerable number of loans are granted at rates of around 0.14, 0.15, or 0.18, many are granted at a much higher rate, for instance 0.32 (5,914 loans).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2251.5

The average monthly loan payment is $272. The middle half of payments fall between roughly $130 and $370.

The largest number of loans is granted to borrowers living in California.

The share of homeowners in the total number of borrowers is around 50%.

The vast majority of borrowers are currently employed. Only a fraction of borrowers do not have a job or work part-time.

Unfortunately, for a large share of loans the occupation is uninformative: the largest single category is Other (28,617 loans). A significant share of borrowers are professionals, computer programmers, or executives.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750003

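The average stated monthly income of a borrower is $5,608. The middle half of borrowers state an income between $3,200 and $6,825. The median ($4,667) is lower than the mean because a few very large stated incomes (up to $1,750,003) pull the mean upward.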

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1300  0.2100  0.2552  0.3100 10.0100

It is interesting to compare income to debt. As we can see in the plot above, the average debt-to-income ratio is 0.2552, and the middle half of borrowers fall between 0.13 and 0.31.

Another important aspect when assessing credit risk is the number of delinquencies that the borrower had in recent years. As we can see in the plot above, most borrowers did not have any delinquencies in the last 7 years.

We see a similar situation with public records from the last 10 years. For at least three quarters of borrowers, the number is 0.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2300  0.5600  0.5238  0.8200  5.9500

Credit card utilization is also an important factor in risk assessment. As we can see in the plot above, the average bankcard utilization is 52%, and the middle half of borrowers use between 23% and 82% of their available bankcard credit.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2091    7593   16424   18254 1435667

Another value that can help us analyze the financial status of the borrower is the revolving credit balance. As we can see, the average value is $16,424, and the middle half of borrowers fall in the $2,091 - $18,254 range.

Based on these data and several other factors, Prosper assesses the credit risk associated with each borrower. As we can see in the graph above, the most common Prosper rating is C, followed by B. For a large number of loans (29,084), we do not have a rating, as this information was not collected before 2009.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00   44.00   80.48  115.00 1189.00

The average number of investors per loan is about 80, while the median is 44. Three quarters of loans have between 1 and 115 investors. The highest number of investors for a single loan is 1,189.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1242  0.1730  0.1827  0.2400  0.4925

The lender yield is the interest rate on the loan less the servicing fee. It is the most important input into any return calculation. As we can see, the lender yield is 0.1827 on average. The middle half of loans have a yield between 0.1242 and 0.2400.
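The gap between the borrower rate and the lender yield can be read off the first six rows of the data set. The sketch below uses only the `BorrowerRate` and `LenderYield` values printed there; the name `spread` is just an illustrative label for the difference.

```r
# BorrowerRate and LenderYield copied from the first six rows of the data set.
borrower_rate <- c(0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314)
lender_yield  <- c(0.1380, 0.0820, 0.2400, 0.0874, 0.1985, 0.1214)

# Difference between what the borrower pays and what the lender earns.
spread <- round(borrower_rate - lender_yield, 4)
print(spread)
```

In these sample rows the difference is 0.01 for the four loans listed from 2012 onward, and 0.02 and 0.035 for the two 2007 loans.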

Univariate Analysis

What is the structure of your dataset?

##          LoanStatus    ListingCreationDate    ClosedDate        
##  Current      :56576   Min.   :2005-11-09   Min.   :2005-11-25  
##  Completed    :38074   1st Qu.:2008-09-19   1st Qu.:2009-07-14  
##  Charged Off  :11992   Median :2012-06-16   Median :2011-04-05  
##  Defaulted    : 5018   Mean   :2011-07-08   Mean   :2011-03-07  
##  Past Due     : 2067   3rd Qu.:2013-09-09   3rd Qu.:2013-01-30  
##  Final Payment:  205   Max.   :2014-03-10   Max.   :2014-03-10  
##  Cancelled    :    5                        NA's   :58848       
##  ListingCategory..numeric.           ListingCategory  Term      
##  Min.   : 0.000            Debt Consolidation:58308   12: 1614  
##  1st Qu.: 1.000            Not Available     :16965   36:87778  
##  Median : 1.000            Other             :10494   60:24545  
##  Mean   : 2.774            Home Improvement  : 7433             
##  3rd Qu.: 3.000            Business          : 7189             
##  Max.   :20.000            Auto              : 2572             
##                            (Other)           :10976             
##   BorrowerRate    LoanOriginalAmount MonthlyLoanPayment BorrowerState     
##  Min.   :0.0000   Min.   : 1000      Min.   :   0.0     Length:113937     
##  1st Qu.:0.1340   1st Qu.: 4000      1st Qu.: 131.6     Class :character  
##  Median :0.1840   Median : 6500      Median : 217.7     Mode  :character  
##  Mean   :0.1928   Mean   : 8337      Mean   : 272.5                       
##  3rd Qu.:0.2500   3rd Qu.:12000      3rd Qu.: 371.6                       
##  Max.   :0.4975   Max.   :35000      Max.   :2251.5                       
##                                                                           
##  IsBorrowerHomeowner                    Occupation   
##  Mode :logical       Other                   :28617  
##  FALSE:56459         Professional            :13628  
##  TRUE :57478         Computer Programmer     : 4478  
##                      Executive               : 4311  
##                      Teacher                 : 3759  
##                      Administrative Assistant: 3688  
##                      (Other)                 :55456  
##       EmploymentStatus StatedMonthlyIncome DebtToIncomeRatio
##  Employed     :67322   Min.   :      0     Min.   : 0.0000  
##  Full-time    :26355   1st Qu.:   3200     1st Qu.: 0.1300  
##  Not available: 7602   Median :   4667     Median : 0.2100  
##  Self-employed: 6134   Mean   :   5608     Mean   : 0.2552  
##  Other        : 3806   3rd Qu.:   6825     3rd Qu.: 0.3100  
##  Part-time    : 1088   Max.   :1750003     Max.   :10.0100  
##  (Other)      : 1630                                        
##  ProsperRating..Alpha.  ProsperScore   OpenCreditLines 
##  C      :18345         0      :29084   Min.   : 0.000  
##  B      :15581         4      :12595   1st Qu.: 5.000  
##  A      :14551         6      :12278   Median : 8.000  
##  D      :14274         8      :12053   Mean   : 8.642  
##  E      : 9795         7      :10597   3rd Qu.:12.000  
##  (Other):12307         5      : 9813   Max.   :54.000  
##  NA's   :29084         (Other):27517                   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  0.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.59             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##                                                  
##  OpenRevolvingMonthlyPayment CurrentDelinquencies AmountDelinquent  
##  Min.   :    0.0             Min.   : 0.0000      Min.   :     0.0  
##  1st Qu.:  114.0             1st Qu.: 0.0000      1st Qu.:     0.0  
##  Median :  271.0             Median : 0.0000      Median :     0.0  
##  Mean   :  398.3             Mean   : 0.5884      Mean   :   918.6  
##  3rd Qu.:  525.0             3rd Qu.: 0.0000      3rd Qu.:     0.0  
##  Max.   :14985.0             Max.   :83.0000      Max.   :463881.0  
##                                                                     
##  DelinquenciesLast7Years PublicRecordsLast10Years RevolvingCreditBalance
##  Min.   : 0.000          Min.   : 0.0000          Min.   :      0       
##  1st Qu.: 0.000          1st Qu.: 0.0000          1st Qu.:   2091       
##  Median : 0.000          Median : 0.0000          Median :   7593       
##  Mean   : 4.119          Mean   : 0.3107          Mean   :  16424       
##  3rd Qu.: 3.000          3rd Qu.: 0.0000          3rd Qu.:  18254       
##  Max.   :99.000          Max.   :38.0000          Max.   :1435667       
##                                                                         
##  BankcardUtilization  LenderYield        Investors      
##  Min.   :0.0000      Min.   :-0.0100   Min.   :   1.00  
##  1st Qu.:0.2300      1st Qu.: 0.1242   1st Qu.:   2.00  
##  Median :0.5600      Median : 0.1730   Median :  44.00  
##  Mean   :0.5238      Mean   : 0.1827   Mean   :  80.48  
##  3rd Qu.:0.8200      3rd Qu.: 0.2400   3rd Qu.: 115.00  
##  Max.   :5.9500      Max.   : 0.4925   Max.   :1189.00  
## 

The Prosper loan data set contains 113,937 loans with 81 variables on each loan. I have selected 29 variables from the original dataset for my analysis.

What is/are the main feature(s) of interest in your dataset?

What interests me most is the profile of the typical peer-to-peer borrower who fails to pay off a loan: what are the demographics and credit characteristics of borrowers who default or are past due on their loans?

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Features like lender yield or the number of investors might help to show which borrower characteristics increase investors’ interest, and are therefore desirable, and which do not.

Did you create any new variables from existing variables in the dataset?

Yes, I created a new ListingCategory variable with textual values, based on the original ListingCategory..numeric. variable.
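A sketch of how such a variable could be derived with `factor()`. The label vector is an assumption: it follows Prosper's published variable definitions as best recalled and agrees with the labels visible in the summary output, but it should be checked against the data dictionary. The six numeric codes are taken from the first rows of the data set.

```r
# Assumed labels for codes 0..20 (verify against Prosper's variable definitions).
category_labels <- c("Not Available", "Debt Consolidation", "Home Improvement",
                     "Business", "Personal Loan", "Student Use", "Auto", "Other",
                     "Baby & Adoption", "Boat", "Cosmetic Procedure",
                     "Engagement Ring", "Green Loans", "Household Expenses",
                     "Large Purchases", "Medical/Dental", "Motorcycle", "RV",
                     "Taxes", "Vacation", "Wedding Loans")

# Numeric codes from the first six rows of the data set.
loans <- data.frame(ListingCategory..numeric. = c(0, 2, 0, 16, 2, 1))

# Map each numeric code to its textual label.
loans$ListingCategory <- factor(loans$ListingCategory..numeric.,
                                levels = 0:20, labels = category_labels)
print(loans)
```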

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I have:

* grouped various Past Due statuses of the Loan Status variable into one Past Due status, and converted this variable into factor with 7 levels;
* converted Listing Creation Date variable to date format;
* converted Term to ordered factor with 3 levels;
* converted Closed Date variable to date format;
* converted Prosper Rating to ordered factor with 8 levels;
* converted Prosper Score to ordered factor with 12 levels;
* converted State abbreviation to full name and added as new variable;
* converted Is Borrower Homeowner to logical type;
* replaced NA in Occupation with Not Available;
* replaced NA in Employment Status with Not Available.

These changes were made to make plotting easier.

While building the plots, I also grouped variables and subset the dataset whenever needed.
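Several of the tidying steps listed above can be sketched in base R as follows. The small data frame is hypothetical, and the raw `Past Due (...)` status strings are an assumption about how the original values look; this is not the author's actual code.

```r
# Hypothetical mini data set mimicking the raw column formats.
loans <- data.frame(
  LoanStatus = c("Completed", "Current", "Past Due (1-15 days)",
                 "Past Due (61-90 days)", "Defaulted"),
  Term = c(36, 36, 60, 12, 36),
  ListingCreationDate = c("2007-08-26 19:09:29", "2014-02-27 08:28:07",
                          "2013-12-14 08:26:37", "2012-10-22 11:02:35",
                          "2007-01-05 15:00:47"),
  IsBorrowerHomeowner = c("True", "False", "True", "True", "False"),
  Occupation = c("Other", NA, "Professional", "Executive", NA),
  stringsAsFactors = FALSE
)

# Collapse every "Past Due (...)" variant into a single "Past Due" level.
loans$LoanStatus <- factor(sub("^Past Due.*$", "Past Due", loans$LoanStatus))

# Term becomes an ordered factor: 12 < 36 < 60 months.
loans$Term <- factor(loans$Term, levels = c(12, 36, 60), ordered = TRUE)

# Keep the date part of the timestamp and parse it.
loans$ListingCreationDate <- as.Date(substr(loans$ListingCreationDate, 1, 10))

# "True"/"False" strings become a logical column.
loans$IsBorrowerHomeowner <- loans$IsBorrowerHomeowner == "True"

# Replace missing occupations with an explicit label.
loans$Occupation[is.na(loans$Occupation)] <- "Not Available"

str(loans)
```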

Bivariate Plots Section

The correlogram shows strong correlations (>= 0.7) between several pairs of variables.

These correlations are not surprising. In my analysis, I plan to focus on the variables that affect a borrower’s ability to pay off the loan, so Loan Original Amount and Monthly Loan Payment will definitely be investigated.
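One way to pull the strongly correlated pairs out of a correlation matrix is sketched below. It uses four numeric columns from the first six rows of the data set, which is far too few rows for a meaningful estimate; the point is only the mechanics of applying the 0.7 threshold. The names `sample_rows` and `strong` are illustrative.

```r
# Four numeric columns from the first six rows of the data set.
sample_rows <- data.frame(
  LoanOriginalAmount = c(9425, 10000, 3001, 10000, 15000, 15000),
  MonthlyLoanPayment = c(330.43, 318.93, 123.32, 321.45, 563.97, 342.37),
  BorrowerRate       = c(0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314),
  LenderYield        = c(0.1380, 0.0820, 0.2400, 0.0874, 0.1985, 0.1214)
)

# Pearson correlation matrix of all columns.
m <- cor(sample_rows)

# Keep each pair once (upper triangle) where |r| reaches the threshold.
idx <- which(abs(m) >= 0.7 & upper.tri(m), arr.ind = TRUE)
strong <- data.frame(var1 = rownames(m)[idx[, "row"]],
                     var2 = colnames(m)[idx[, "col"]],
                     r    = round(m[idx], 2))
print(strong)
```

On the full data set, `cor()` would need `use = "pairwise.complete.obs"` (or a similar setting), because many columns contain missing values.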

## # A tibble: 7 x 3
##   LoanStatus    AvgProsperScore Count
##   <fct>                   <dbl> <int>
## 1 Cancelled                0        5
## 2 Defaulted                1.13  5018
## 3 Charged Off              2.40 11992
## 4 Completed                3.38 38074
## 5 Past Due                 5.06  2067
## 6 Final Payment            5.75   205
## 7 Current                  5.84 56576

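Looking at the plot above, we can see that the average Prosper score is very low for loans that defaulted or were charged off. Interestingly, the average is also low for completed loans and rather high for loans that are past due. Current loans and those awaiting a final payment have the highest average score. One caveat: the 29,084 loans without a Prosper score appear under score 0 in the summary above, so if they are counted as zeros here, they pull down the averages for statuses dominated by older loans (note the average of exactly 0 for the five cancelled loans).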
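The effect of coding missing scores as 0 can be illustrated with a small, entirely hypothetical data frame. The tibble output suggests the original summaries were built with dplyr; this sketch uses base R to stay dependency-free, and whether the averages in the table include zero-coded loans is an assumption.

```r
# Hypothetical loans: older loans often have no ProsperScore (NA).
loans <- data.frame(
  LoanStatus   = c("Completed", "Completed", "Completed",
                   "Current", "Current", "Defaulted"),
  ProsperScore = c(NA, NA, 6, 7, 9, NA)
)

# Variant 1: missing scores counted as 0.
score_zero    <- ifelse(is.na(loans$ProsperScore), 0, loans$ProsperScore)
avg_with_zero <- tapply(score_zero, loans$LoanStatus, mean)

# Variant 2: missing scores excluded from the average.
avg_scored_only <- tapply(loans$ProsperScore, loans$LoanStatus,
                          mean, na.rm = TRUE)

# Number of loans per status, as in the Count column.
count <- table(loans$LoanStatus)

print(avg_with_zero)
print(avg_scored_only)
print(count)
```

In this toy example the Completed average drops from 6 to 2 once the two unscored loans are counted as zeros, and the Defaulted group, which has no scored loans at all, has an average of 0 in the first variant and no defined average in the second.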

## # A tibble: 7 x 3
##   LoanStatus    AvgLoanOriginalAmount Count
##   <fct>                         <dbl> <int>
## 1 Cancelled                     1700      5
## 2 Completed                     6189. 38074
## 3 Charged Off                   6399. 11992
## 4 Defaulted                     6487.  5018
## 5 Past Due                      8258.  2067
## 6 Final Payment                 8346.   205
## 7 Current                      10361. 56576

As we can see in the above plot, the average loan amount for charged-off and defaulted loans is much lower than for current loans. Interestingly, the average loan amount is also lower for completed loans. The average loan amount for current loans is $10,361.

## # A tibble: 12 x 3
##    ProsperScore AvgLoanOriginalAmount Count
##    <ord>                        <dbl> <int>
##  1 0                            6159. 29084
##  2 1                            4571.   992
##  3 2                            5280.  5766
##  4 3                            7063.  7642
##  5 4                            8402. 12595
##  6 5                            8400.  9813
##  7 6                            9223. 12278
##  8 7                           10097. 10597
##  9 8                           10488. 12053
## 10 9                           10056.  6911
## 11 10                          11743.  4750
## 12 11                          14858.  1456

Another interesting observation concerns the relation between the loan amount and the borrower’s score. In the above plot (N/A values are removed) we can see that larger loans are granted to borrowers with higher scores. This is in line with common sense - larger loans are granted to more reliable borrowers, and smaller loans to those with a greater risk of non-repayment.

## # A tibble: 7 x 3
##   LoanStatus    AvgMonthlyPayment Count
##   <fct>                     <dbl> <int>
## 1 Cancelled                  61.5     5
## 2 Completed                 219.  38074
## 3 Defaulted                 233.   5018
## 4 Charged Off               235.  11992
## 5 Past Due                  276.   2067
## 6 Final Payment             298.    205
## 7 Current                   320.  56576

As we can see in the above plot, the average monthly payment is the highest for current loans, followed closely by the loans awaiting final payment. Past due loans are in third position. We can also see that the average monthly loan payment for completed loans is one of the lowest. This is as expected - smaller loans are easier to pay back, while larger loans take more time and effort.

## # A tibble: 7 x 3
##   LoanStatus    AvgStatedMonthlyIncome Count
##   <fct>                          <dbl> <int>
## 1 Cancelled                      2609.     5
## 2 Defaulted                      4367.  5018
## 3 Charged Off                    4486. 11992
## 4 Completed                      5325. 38074
## 5 Past Due                       5367.  2067
## 6 Current                        6153. 56576
## 7 Final Payment                  6312.   205

As we can see in the above plot, the average stated monthly income is the highest for loans awaiting final payment, followed closely by current loans. Past due loans are in third position, close to completed loans. Low income is typical for cancelled, defaulted and charged-off loans.

It’s not surprising to see that the average debt to income ratio is the highest for defaulted, past due and charged off loans. We can also see it is significantly lower for loans awaiting the final payment and completed.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

An interesting relationship I have observed is that Prosper score and loan status are not related in the way we might expect. Quite a high number of loans with a high Prosper score are past due. Moreover, the average Prosper score for completed loans is much lower than for past due loans.

Another interesting insight is that the average loan amount for borrowers with the lowest score - 0 - is higher than the average loan amount for borrowers with score 1 and score 2. This is surprising, as the risk related to investing in a borrower with score 0 is much higher than investing in a borrower with score 1 or 2.

What was the strongest relationship you found?

The strongest relationship I found was between debt to income ratio and loan status. Borrowers with a high debt to income ratio more often defaulted, fell past due or had their loans charged off.

Multivariate Plots Section

Let’s start by looking at the relation between employment status, debt to income ratio and loan status.

As we can see in the above plot, only a small fraction of the loans belong to unemployed borrowers. Most of the current loans are in the Employed and Other buckets. Unfortunately, most of the defaulted loans are in the Not Available bucket. The debt to income ratio is below 1 for most of the loans, although there is a small peak at a ratio of 10. To get a better understanding of these relations, let’s zoom in on loans with a debt to income ratio below the 99th percentile.
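Zooming in below the 99th percentile amounts to computing a cutoff with `quantile()` and filtering on it. The sketch below assumes a column named `DebtToIncomeRatio` and uses invented values with a single extreme observation; the same pattern applies to the monthly payment and stated income plots.

```r
# Hypothetical debt-to-income ratios: 99 ordinary values, one extreme, one NA.
loans <- data.frame(
  DebtToIncomeRatio = c((1:99) / 100, 10, NA)
)

# 99th percentile of the observed (non-missing) ratios.
cutoff <- quantile(loans$DebtToIncomeRatio, probs = 0.99, na.rm = TRUE)

# Keep only rows below the cutoff; subset() also drops rows where the ratio is NA.
zoomed <- subset(loans, DebtToIncomeRatio < cutoff)

nrow(zoomed)
range(zoomed$DebtToIncomeRatio)
```

Filtering the rows, rather than only limiting the axis, means any statistic computed for the zoomed plot ignores the extreme values as well, which is worth keeping in mind when comparing the two plots.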

Looking at this plot, we can see that there are significantly fewer charged-off loans in the Employed bucket. We can, however, see them in the Full-time bucket. We can also see that defaulted loans are spread rather evenly across the borrower’s debt to income ratio. This is rather surprising, as we would expect borrowers with the highest debt to income ratio to default on their loans more frequently.

Now, let’s see how monthly loan payment relates to employment status and loan status.

As we can see in the above plot, most of the loans have a monthly payment below $1,500. The Self-employed bucket is quite interesting, as we can see that loans with a payment closer to $1,000 tend to be charged off or defaulted. To get a better understanding of these relations, let’s zoom in on loans with a monthly payment below the 99th percentile.

Monthly loan payment does not seem to have a big impact on the loan status for employed borrowers. We can see only a slight increase in the number of charged-off and defaulted loans for full-time and not employed borrowers. For self-employed borrowers, we can see that the number of charged-off or defaulted loans with a monthly payment above $600 is higher than for lower monthly payments.

Another interesting observation is that borrowers who are not employed, retired or working part-time tend to be offered loans with a monthly payment below $400. The majority of loans granted to self-employed borrowers have a monthly payment below $600. At the same time, employed and full-time workers get loans with a monthly payment below $900.

Let’s now look at the relation of Stated Monthly Income, Monthly Loan Payment and Loan Status.

As we can see, there are some outliers in Stated Monthly Income, so let’s zoom in on values below the 99th percentile.

As we can see, for the majority of loans the monthly loan payment is below $700 and the borrower’s stated income is below $10,000. It is interesting to see a lot of charged-off and defaulted loans in the lower left corner of the plot - a small monthly payment but also a small income. It seems that borrowers with a higher income can pay off the loan even if the monthly payment is larger, while borrowers with a low income struggle to pay off the loan even if the monthly payment is low.

In this plot, we can look closer at the Prosper rating, loan original amount and loan status. We can see that most of the completed loans were loans with a small amount. Most of the loans with a high amount are still active. We can also clearly see a relation between the Prosper rating and the loan amount. Borrowers with a low rating are granted smaller loans, while those with a good rating get bigger loans. High Risk and E category borrowers are almost never granted a loan above $10,000, while some of the AA, A and B category borrowers have loans above $25,000.

Lastly, let’s see whether there is any significant relation between Prosper Rating, Listing Category and Loan Status.

It seems that there is no relation between the listing category and loan status. There is no category that would have significantly more defaulted or charged-off loans.
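One way to check this kind of visual impression is to compare, per listing category, the share of loans that ended up defaulted or charged off. The sketch below uses `table()` and `prop.table()` on hypothetical data that was deliberately constructed so that every category has the same share; it illustrates the calculation only and says nothing about the actual figures in the data set.

```r
# Hypothetical listing categories and statuses; values are invented.
loans <- data.frame(
  ListingCategory = c("Debt Consolidation", "Debt Consolidation",
                      "Debt Consolidation", "Debt Consolidation",
                      "Home Improvement", "Home Improvement",
                      "Business", "Business"),
  LoanStatus = c("Current", "Completed", "Defaulted", "Charged Off",
                 "Current", "Defaulted", "Completed", "Charged Off")
)

# Count loans per category and status, then convert each row to shares.
counts <- table(loans$ListingCategory, loans$LoanStatus)
shares <- prop.table(counts, margin = 1)

# Share of defaulted or charged-off loans in each category.
bad_share <- rowSums(shares[, c("Defaulted", "Charged Off")])
print(round(bad_share, 2))
```

Comparing shares instead of raw counts keeps large categories from dominating the picture simply because they contain more loans.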

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

Were there any interesting or surprising interactions between features?

I noticed that only a small fraction of the loans belong to unemployed borrowers, while most of the current loans belong to employed and full-time borrowers. At first glance, employed borrowers have significantly fewer charged-off loans; however, these loans appear frequently for full-time borrowers. Unfortunately, for most of the defaulted loans, we do not have information on the borrower’s employment status, which limits our investigation.

Another interesting finding is that the debt to income ratio is below 1 for most of the loans. We could also see that defaulted loans are spread rather evenly across the borrower’s debt to income ratio. This is rather surprising, as we would expect borrowers with the highest debt to income ratio to default on their loans more frequently.

When it comes to how monthly loan payment relates to employment status and loan status, monthly loan payment does not seem to have a big impact on the loan status for employed borrowers. We could see only a slight increase in the number of charged-off and defaulted loans for full-time and not employed borrowers. However, for self-employed borrowers, the number of charged-off or defaulted loans with a monthly payment above $600 is higher than for lower monthly payments.

Another interesting observation is that borrowers who are not employed, retired or working part-time tend to be offered loans with a monthly payment below $400. The majority of loans granted to self-employed borrowers have a monthly payment below $600. At the same time, employed and full-time workers get loans with a monthly payment below $900.

For the majority of loans, the monthly loan payment is below $700 and the borrower’s stated income is below $10,000. It was interesting to see a lot of charged-off and defaulted loans for borrowers with a small monthly payment and a small income. It seems that borrowers with a higher income can pay off the loan even if the monthly payment is larger, while borrowers with a low income struggle to pay off the loan even if the monthly payment is low.

We could also see that most of the completed loans were loans with a small amount. Most of the loans with a high amount are still active. We could also clearly see a relation between the Prosper rating and the loan amount. Borrowers with a low rating are granted smaller loans, while those with a good rating get bigger loans. High Risk and E category borrowers are almost never granted a loan above $10,000, while some of the AA, A and B category borrowers have loans above $25,000.

Surprisingly, we did not find any relation between the listing category and loan status. There is no category that would have significantly more defaulted or charged-off loans.


Final Plots and Summary

Plot One

Description One

About half of the loans in the dataset are currently active (about 56,500). Over 38,000 have been paid off. Around 17,000 were charged off or defaulted. Understanding the reasons behind the defaulted and charged-off loans is very important from the loan safety perspective. Both investors and loan offering companies are highly interested in understanding what factors impact the borrower’s ability to pay off the debt.

Plot Two

Description Two

In this plot, we can look closer at the stated monthly income, monthly loan payment and loan status. It is interesting to see a lot of charged-off and defaulted loans in the lower left corner of the plot - a small monthly payment but also a small income. It seems that borrowers with a higher income can pay off the loan even if the monthly payment is larger, while borrowers with a low income struggle to pay off the loan even if the monthly payment is low.

Plot Three

Description Three

In this plot, we can look closer at the Prosper rating, loan original amount and loan status. We can see that most of the completed loans were loans with a small amount. Most of the loans with a high amount are still active. We can also clearly see a relation between the Prosper rating and the loan amount. Borrowers with a low rating are granted smaller loans, while those with a good rating get bigger loans. High Risk and E category borrowers are almost never granted a loan above $10,000, while some of the AA, A and B category borrowers have loans above $25,000.


Reflection

I have tried to understand whether employment status, loan amount, monthly loan payment and monthly income have a significant impact on the borrower’s ability to pay off the loan. The main challenge I faced with this dataset is that many loans lacked some of the information, e.g. the employment status or listing category was missing. This significantly limited the investigation and the ability to draw conclusions from the data. Gathering more data about the borrowers could help us better understand the relations between loan status and the borrower’s profile.